Spontaneous binding of single-stranded RNAs to RRM proteins visualized by unbiased atomistic simulations with a rescaled RNA force field

Abstract Recognition of single-stranded RNA (ssRNA) by RNA recognition motif (RRM) domains is an important class of protein–RNA interactions. Many such complexes were characterized using nuclear magnetic resonance (NMR) and/or X-ray crystallography techniques, revealing ensemble-averaged pictures of the bound states. However, it is becoming widely accepted that better understanding of protein–RNA interactions would be obtained from ensemble descriptions. Indeed, earlier molecular dynamics simulations of bound states indicated visible dynamics at the RNA–RRM interfaces. Here, we report the first atomistic simulation study of spontaneous binding of short RNA sequences to RRM domains of HuR and SRSF1 proteins. Using a millisecond-scale aggregate ensemble of unbiased simulations, we were able to observe a few dozen binding events. HuR RRM3 utilizes a pre-binding state to navigate the RNA sequence to its partially disordered bound state and then to dynamically scan its different binding registers. SRSF1 RRM2 binding is more straightforward but still multiple-pathway. The present study necessitated development of a goal-specific force field modification, scaling down the intramolecular van der Waals interactions of the RNA which also improves description of the RNA–RRM bound state. Our study opens up a new avenue for large-scale atomistic investigations of binding landscapes of protein–RNA complexes, and future perspectives of such research are discussed.


INTRODUCTION
From synthesis to degradation, RNA molecules are almost constantly bound to proteins in vivo, forming ribonucleoprotein complexes (1)(2)(3)(4)(5). The interactions are very variable and dynamic, and a single RNA transcript binds to numerous proteins during its lifetime (6). Understanding the principles and function of protein-RNA recognition is therefore a cornerstone of RNA biology. Individual proteins can specialize in binding flexible single-stranded RNA (ssRNA) as well as structured RNAs such as duplexes or other recurrent motifs (2,(7)(8)(9). Bases within ssRNA are fully exposed for a readout which is commonly utilized by proteins whose biological function involves a targeted response towards specific RNA sequences (10). Among such proteins, the RNA recognition motif (RRM) domain is the most prominent, being the most widespread RNA-binding motif in eukaryotes (11). Individual RRM domains can recognize various RNA sequences despite sharing a very conserved fold in which two ␣-helices are packed against an antiparallel ␤sheet surface (12). The partially exposed ␤-sheet surface is a canonical RNA-binding RRM site. However, other parts of the RRM, such as the ␣-helices, can also be utilized for RNA binding (12). The specificity of RRMs towards different RNA sequences can be astonishingly tuned by subtle variations of their surface-exposed amino acids (13)(14)(15)(16)(17). The basic principles of RNA-RRM interactions were elucidated by nuclear magnetic resonance (NMR) and/or Xray crystallography techniques. However, complete understanding of the RNA-RRM recognition would require information extending beyond the static ensemble-averaged picture of the bound state, i.e. a description of the RNA-RRM interactions as a dynamic ensemble of conformations, which form with different populations and lifetimes. Knowledge of the complete free energy binding landscape would allow unraveling how the proteins search for their RNA targets in the cellular pool of biomolecules and how different RNA sequences compete with each other for the binding.
Molecular dynamics (MD) simulations are a method for describing the Boltzmann distribution (populations) of molecules using carefully calibrated empirical potentials, commonly known as force fields. MD has been successfully applied many times to study nucleic acid systems, including the protein-RNA complexes (18)(19)(20). MD provides potentially unlimited spatial and temporal resolution unrivalled by any experimental method and can significantly expand upon the information obtained from ensembleand time-averaged experimental structural data. However, in practice, MD studies are also affected by fundamental limitations--the quality of the force field and the affordable time scale (sampling). The two issues are interconnected as force field inaccuracies are routinely discovered once longer time scales become computationally affordable (18). A notorious problem of biomolecular force fields designed to simulate folded states is excessive compaction of unstructured ensembles (21). This makes the description of ssRNAs challenging, as illustrated by studies of RNA tetranucleotides (TNs) (18,(22)(23)(24)(25)(26) and tetraloops (27)(28)(29)(30)(31). In the case of TNs, the standard AMBER RNA force field leads to large populations of compact intercalated structures, which is not in agreement with experiments (32)(33)(34). For TLs, the excessive interactions in the unfolded states frustrate the free energy landscape and hinder folding. The solution to this problem is not trivial, as simply tuning the force field for a better description of small systems often leads to deterioration of the description of larger folded RNAs (26). A possible solution is introducing additional force field terms orthogonal to the currently used parameters, such as the gHBfix potential which tunes the strength of hydrogen bonds (26,35,36). Non-bonded-fix (NBfix) correction, a pair-specific adjustment of Lennard-Jones (LJ) combination rules, is another common approach to increase the parameterization flexibility (30,37,38). An alternative option is to abandon the daunting task of a general force field reparameterization and instead adjust the parameters in a goal-directed manner for specific systems/processes (39)(40)(41)(42). This strategy is common in coarse-grained modeling (18).
The prime goal of our current work was to capture spontaneous binding events of ssRNAs to their binding sites on the surface of RRMs, to complement the simple twostate picture of binding by visualization of a representative number of continuous binding pathways. However, we found out that it is virtually impossible to observe any binding events with the standard atomistic RNA OL3 AMBER force field (43) that is designed to simulate folded RNAs. The free ssRNAs rapidly and irreversibly collapsed due to RNA self-interactions, preventing the binding. Thus, we had to develop a suitable force field modification that would reduce the spurious RNA self-interactions. We directly address this problem via NBfix by rescaling the well-depth value (ε) of the LJ potentials between selected RNA-RNA atomic pairs. We abbreviate the new modification as stafix (stacking fix) as it is aimed at weakening the stacking and van der Waals (vdW) interactions as opposed to hydrogen bonds.  (44,46). (A) 3D representation of HuR RRM3 (lime) complexed with 5 -UUUUU-3 RNA. Nitrogen, carbon, oxygen and phosphorus atoms in the RNA are in blue, yellow, red and brown, respectively. The RNA backbone is indicated by an ochre tube. The individual binding pockets are labeled and shown as colored transparent molecular surfaces. The annotation of the bound RNA sequence is color-coded, with black indicating unbound nucleotides. Pockets p2 and p3 recognize only uracils and are well defined, whereas pockets p1 and p4 show a significant degree of disorder and can also accept adenines (44). (B) SRSF1 RRM2 (pink) complexed with 5 -AGGAC-3 RNA. All three pockets are well defined.
Stafix entirely eliminated occurrence of the spurious ssRNA self-interactions and allowed us to perform almost a millisecond of standard (unbiased) MD simulations where we documented a few dozen spontaneous binding events of short ssRNA sequences to two different RRM proteins--HuR RRM3 and SRSF1 RRM2. The RRM3 domain of HuR (44,45) ( Figure 1A) recognizes either Urich or AU-rich segments of mRNA in a canonical mode typical for RRM domains (11), and its crystal structure showed that it can bind four or five nucleotides (44). However, a large degree of disorder was detected at its protein-RNA interface, with only pockets p2 and p3 being always resolved within the asymmetric unit of the X-ray structure (44). In contrast, SRSF1 RRM2 ( Figure 1B) recognizes the GGA triplet motif in mRNA via well-defined binding pockets (46). Here, for the first time, we sample extensively (albeit obviously not completely) the binding funnel of both complexes and conclude that differences between the native protein-RNA interfaces are already reflected in details of their binding pathways. Namely, HuR RRM3 utilizes a previously unknown 'pre-binding mode' which navigates the binding and then facilitates fast binding register exchanges without fully separating the RNA from the protein. By separating the register exchange (i.e. sliding) from the native binding mode, HuR RRM3 might be able to efficiently scan the RNA for its target sequence. In contrast, SRSF1 RRM2 does not seem to utilize any pre-binding mode and displays very slow binding register exchanges.
The stafix modification was absolutely essential to execute our study as otherwise the RNA molecules strongly prefer binding-incompetent spurious self-interactions rather than sampling potential binding sites on the proteins' surfaces. Even if real ssRNAs sample to a certain extent the compact structures (47) which we prevent by stafix, they would probably be off-pathway intermediate states with respect to the binding process. Thus, stafix is justifiable and should not fundamentally distort the studied process. Importantly, by suppressing the spurious RNA intramolecular interactions, stafix also significantly stabilizes native protein-RNA interfaces of fully bound protein-ssRNA complexes, which further substantiates our approach.
In summary, we provide the first atomistic insight into the complex multiple pathway processes of spontaneous binding of ssRNA to the RRM proteins and suggest that RRM proteins utilize very diverse binding mechanisms to communicate with their RNA targets.

Selection of initial structures
We used X-ray structure of the HuR RRM3 protein-RNA complex (PDB: 6GC5) (44) as the starting structure for simulations with either 5 -UUUUU-3 (chains A and E) or 5 -UUUA-3 (chains C and G) RNAs bound. Coordinates of the U1 nucleotide in the 5 -UUUUU-3 system were obtained from chain G after first aligning the protein chains A and C to overlap. Coordinates of the nontarget 5 -CCCCC-3 RNA were obtained by mutating the 5 -UUUUU-3 sequence. The structure of the free HuR RRM3 was prepared by removing the bound RNA. For the simulations of the SRSF1 RRM2 complex, we used the first frame of its NMR structure (PDB: 2M8D) (46). To increase the sampling efficiency, the simulated RNA sequence was shortened from 5 -UGAAGGAC-3 to 5 -AGGAC-3 and we removed the unstructured N-terminal chain of the RRM2 (residues 106-119). For simulations of the A-RNA duplex, we used the r(CGCGGGAUUUCCCGCG) structure prepared with Nucleic Acid Builder (NAB) (48). Simulations of the neomycin-sensing riboswitch were started from the first frame of its NMR structure (PDB: 2N0J). (49) Simulations of the free r(UUUUU) and r(UUUA) were started from the structure of the HuR RRM3 complex (44) with the protein removed. Initial structures for simulations of r(AAAA), r(CAAU), r(CCCC) and r(UUUU) TNs and r(UCAAUC) hexanucleotide (HN) were prepared using NAB, corresponding to an A-RNA duplex structure with the complementary strand removed.
In all the spontaneous binding simulations (SBSs), the bound RNA was manually shifted away from the protein prior to the system building so that the distance between the two molecules was ∼20Å. We suggest that this distance is entirely sufficient for randomizing RNA's internal structure as well as its position with respect to the protein prior to its first spontaneous and random contact with the protein.
Nevertheless, for some HuR RRM3-r(UUUUU) SBSs, we also tested a second starting structure with the RNA placed at a different manually picked initial position but at the same distance (∼20Å) from the protein (Supplementary Figure S1). The conformation of the RNA in this second starting structure corresponded to an A-RNA helix with the complementary strand removed (prepared by NAB) instead of the protein-bound conformation which was utilized in the first starting structure. The simulations gave qualitatively the same results irrespective of which starting structure was used.

System building and simulation protocol
We used the xLeap module of AMBER 20 (48) to prepare the coordinate and topology files. The RNA was described by the bsc0 OL3 (i.e. OL3) force field (recommended firstchoice AMBER RNA force field) (43). For simulations of TNs and HN, the OL3 force field was additionally combined with modification of phosphate vdW parameters (50) and adjusted backbone dihedrals (51,52) (so-called OL3CP version). For simulations of the free r(UUUUU), we also tested the Chen-García (30) and DESRES (53) RNA force fields. In most simulations, the protein was described using the ff12SB force field which is the earlier version of the ff14SB (54). The ff12SB provides a better performance due to one specific dihedral term which significantly alters rotational kinetics of aromatic side chains. This term is present in the ff12SB, but is missing in both ff14SB and ff19SB, which can cause problems in description of some protein-RNA interactions. So far, this has been only sparsely noted in the literature (55)(56)(57)(58). The problem is now defined and documented in this work, by comparing the performance of the ff12SB, ff14SB and ff19SB force fields for the isolated HuR RRM3 protein.
Prior to simulation, all systems were surrounded in an octahedral box of either SPC/E (59) or OPC (60) water molecules with a minimal distance of 12Å between the solute molecules and the box border. Only the OPC (60) water model was used in conjunction with the ff19SB protein force field (61). For TNs and HN, the cubic water box was utilized. An excess salt ion concentration of 0.15 M was established by randomly adding KCl ions (62) around the solute. The minimization and equilibration was performed with the pmemd.MPI module of AMBER 20 using the protocol described in (63). Afterwards, the production simulations were run using the pmemd.cuda (64) on RTX 2080ti graphic cards for a standard length of 10 s. Multiple simulations of each system were run, with different trajectories obtained by assigning initial atomic velocities using a random seed number at the beginning of each simulation. SHAKE (65) and hydrogen mass repartitioning (66) were used in all simulations, permitting a 4 fs integration step. Long-range electrostatic interactions were described with the particle mesh Ewald scheme (67) with periodic boundary conditions applied to handle the system border bias. The cut-off distance for non-bonded LJ interactions was 9 A. We used a Langevin thermostat and Monte Carlo barostat (48). Some of the TN and HN simulations additionally included the gHBfix (26) and tHBfix (35) potentials which selectively modify the stability of specific H-bond types. We also performed one enhanced sampling simulation of the r(UUUUU) using the standard replica exchange solute tempering (REST2) scheme (68). Additional simulations of this RNA were also performed with the AMOEBA polarizable force field (69) or with increased volume of the simulation box (see Supplementary Information for more details).

Stafix modification--rescaling the LJ interactions of RNA
After creating the topology files in xLeap, we introduced off-diagonal terms in the LJ matrix to modify selected nonbonded RNA-RNA potentials; an approach commonly referred to as NBfix. This was done automatically by a custom-made python script (see the Data Availability statement) utilizing the parmed module of AMBER 20. In the case of protein-RNA simulations, it required that unique atom types were defined for all RNA atoms which shared their LJ parameters with the protein. Afterwards, selected RNA-RNA pairwise LJ potentials ( Figure 2) were modified in the topology files so that the well-depth value (ε) of their LJ potentials was reduced by a specific scaling factor. Two different scaling factors were tested. The modification is aimed at suppressing excessive non-specific RNA selfinteractions such as the base-base, base-phosphate, sugarbase and sugar-sugar stacking. As such, we refer to the modification as the 'stacking fix' (abbreviated as stafix). For technical reasons, we avoided rescaling the LJ potentials related to H-bonds. See Supplementary Information for full details.

Analyses
The cpptraj module of AMBER and VMD were used for post-processing, analyses and visual inspection of all simulations (70,71). Povray and gnuplot were used to prepare molecular figures and graphs, respectively. For simulations of systems where we tested both the OPC and SPC/E water models, the data shown in the figures refer to systems where SPC/E was used unless stated otherwise. For graphs showing time development data, a single simulation was selected as the data source unless stated otherwise. For the histogram analyses, a combined simulation ensemble of all simulations of the given type was used, the bin size was set to 10 and the distribution was normalized. The solvent-accessible surface (SAS) of RNA and the size of protein-RNA interfaces were calculated using the LCPO method (72). The bound protein was excluded in RNA SAS calculations. For visualization of the molecular surfaces with VMD, we used a probe size radius of 1.4Å.
To monitor the binding process and exchanges of bound nucleotides, we have used the following approach to identify which nucleotides occupied the specific binding pockets in simulations of HuR RRM3 and SRSF1 RRM2. We first defined the list of protein heavy atoms located within 7Å of the geometrical centers of the base aromatic rings within each pocket of the native experimental structure. The presence of individual nucleotides within the pocket in all simulation frames was then evaluated based on the distance between the geometrical center of the base aromatic ring and the geometrical centers of the previously defined lists of protein atoms. For HuR RRM3, a specific nucleotide was considered to be present in the pocket when the distance was <7 A for the pockets p1, p2 and p4, and 5.4Å for pocket p3. The shorter distance for pocket p3 was empirically chosen as greater values often led to the occupancy of this specific pocket being incorrectly assigned. For SRSF1 protein, the distance criterion was 7Å for all three pockets. Note that this analysis of binding was based solely on empirical distance conditions and did not evaluate other criteria, such as formation of the individual protein-RNA H-bonds. Nevertheless, based on careful visual inspection, we suggest that it provides a fairly representative albeit approximate description of the highly dynamic binding process.
The native contacts analysis was performed by cpptraj, with non-hydrogen interatomic distances in the starting structure <5Å being considered native. For the A-RNA duplex system, the base pair, base pair step and helical parameters were analyzed by x3DNA (73) and cpptraj using default settings. MD conformational ensembles of TNs and HN were compared with available data from NMR solution experiments (32)(33)(34); see Supplementary Information for details. The dominant conformations of TNs and HN were identified by clustering (74) combined with the εRMSD metric (75); see also (35).

RESULTS
We present results of over a millisecond of standard (unbiased) MD simulations performed with the AMBER RNA OL3 force field (43) and those where we additionally combined OL3 with the stafix correction to eliminate the RNA overcompaction (henceforth referred to as OL3-stafix force field; see also Table 1 and Supplementary Table S1). The prime goal was to investigate spontaneous binding of the ssRNAs to the HuR RRM3 and SRSF1 RRM2 domains from the unbound state. The first key observation is that the RNA conformations recognized by HuR RRM3 and SRSF1 RRM2 are not populated in simulations of the unbound RNA sequences with the OL3 force field due to excessive RNA self-interaction so that the ssRNA is unable to sample conformations competent for binding. As explained in the Introduction, it is an artifact typical for force fields parametrized to simulate folded biopolymers. This undesired behavior was corrected with the OL3-stafix force field which allows routine observation of spontaneous conformational captures of the free RNAs by the proteins, atomistic details of which are for the first time described in this work. OL3-stafix also dramatically improves the stability of the protein-RNA interface interactions when simulating the fully bound complex. Finally, although OL3-stafix was primarily developed as goal-specific force field correction for protein-RNA complexes capturing unstructured RNAs, we document its effects also in simulations of TNs and A-RNA duplexes.

OL3-stafix eliminates formation of spuriously compacted structures of free RNAs seen with the standard OL3 and some other force fields
MD simulations of the free 5 -UUUUU-3 and 5 -UUUA 3 RNAs, which are both the target sequences of HuR RRM3 ( Figure 1A), showed a gradual but ultimately permanent collapse as the RNA self-interactions accumulate. We term the resulting RNA structures as 'globules', being best characterized by their compact globular shape (and a reduction of SAS by ∼20%; Figure 3; Supplementary Figure S2). This overcompaction is consistent with RNA force field problems reported in the literature (18,(22)(23)(24)(25)(26)(32)(33)(34)53) and we observe this effect in standard MD simulations using the OL3 (43), as well as DESRES (53) and García (30) RNA force fields. The same is observed for the OL3 also in the REST2 enhanced sampling simulation (Supplementary Figure S3), suggesting a relatively good convergence of the standard MD. We note that the RNA globule is not  formed with the polarizable AMOEBA force field (Supplementary Information) which is consistent with earlier tests of the AMOEBA force field on TNs (69).
Having formed the globule, the target RNAs are no longer competent to form interactions with HuR RRM3 as these require an extended RNA chain where the individual bases are accessible for readout. Application of OL3-stafix with either a 0.5 or a 0.1 scaling factor prevents formation of the RNA globule and maintains the extended-like structure of the free RNA strand, which is accessible for the solvent, and in turn for the protein (Figure 3). OL3-stafix can also swiftly dissolve globules which had already formed (Supplementary Figure S3). We note that the OL3-stafix force field does not rigidify the ssRNA, which we suspect could occur if we tried to use dihedral reparametrizations to prevent the overcompaction.  Table S1). The transparent gray blobs indicate solvent-accessible molecular surfaces. The standard AMBER OL3 force field forms a compact globule (left), while a more extended conformation can be maintained with both versions of the OL3-stafix force field (middle and right). (B) Time development of the RNA solventaccessible surface (SAS). (C) Histograms of the data from (B). The SAS value corresponding to the protein-bound conformation (vertical dashed line) is inaccessible with OL3 but regularly visited with the OL3-stafix force field, allowing initiation of the conformational capture mechanism.

Formation of RNA globule also disrupts bound HuR RRM3-5 -UUUUU-3 complex
The structural collapse and SAS reduction seen for the free 5 -UUUUU-3 RNA with the standard OL3 force field ( Figure 3) also affects the RNA molecule when bound to HuR RRM3 ( Figure 4). Both experiments and MD simulations have shown that the RNA stably binds in pockets p2 and p3, while binding in pockets p1 and p4 is partially disordered [see Figure 1A; (44)]. The partial disorder makes the system prone to accumulating RNA self-interactions which subsequently degrade its interface ( Figure 4A) by displacing the native protein-RNA H-bonds and vdW interactions. In other words, with the standard force field, the system rather quickly departs from sampling the partially disordered states and forms highly stable RNA self-interactions that do not allow the nucleotides to return to their binding pockets ( Figure 4D). This clearly contradicts the experimental data (44). Permanent formation of such RNA self-interactions is entirely prevented by both stafix variants ( Figure 4; Supplementary Figure S4). The OL3-stafix simulations still capture the experimentally documented disordered nature of RNA binding in pockets p1 and p4 as numerous and extensive reversible disruptions are observed on the simulation time scale. However, elimination of the spurious RNA self-interactions allows full restoration of the native binding, contrasting the simulations with the standard OL3 force field. See Supplementary Information for full details.

Spontaneous protein-RNA binding events of 5 -UUUUU-3 and 5 -UUUA-3 to HuR RRM3 on a 10 s time scale
We have used the OL3-stafix force field in 29 simulations of 10 s (Table 1), attempting to observe spontaneous (i.e. unguided by any pre-determined pathway or coordinate) binding events. The target RNA was placed at a ∼20Å distance away from the protein prior to the simulation start (see the Materials and Methods for details). We term such a calculation as spontaneous binding simulation (SBS). RNA globules were not observed in any of the OL3-stafix SBSs (Supplementary Figure S5) and, after a period of free diffusion through the water box, the RNA was successfully captured by the protein in its native binding mode in 23 (∼80%) simulations (Table 1; Supplementary Figure S6) We classify a successful binding event when most of the native binding interface is established for the rest of the simulation, leading to qualitatively identical behavior to that in simulations started from a bound structure. It includes the stable binding in pockets p2 and p3 and partially disordered binding in pockets p1 or p4 (44). We generally observed that once the native binding mode was established in an SBS, it remained stable (albeit very dynamic) for the rest of the simulation, demonstrating a predictive power of the stafix-modified force field. Also, as shown in the Supplementary Information, using OL3-stafix does not abolish the selectivity of HuR RRM3 for its RNA targets. Binding of target RNA to HuR RRM3 is a multiple pathway multiple-step process with a substantial role of structural dynamics at each step Our SBSs of the HuR RRM3 system revealed diverse multiple pathways by which the RNA achieves native binding to the protein. Each step of the process was also fully reversible, with multiple back-and-forth transitions sometimes observed. Although each binding event was unique in its details, the typical process can be characterized with four distinct steps ( Figure 5). The first step involved one of the nucleotides randomly approaching and being captured by any of the four binding pockets. On a scale of tens to hundreds of nanoseconds, this was followed by the second step, which was formation of what we term a 'pre-binding state', characterized by extensive but unspecific intermolecular contacts, mostly between the ribose sugar rings of the RNA and hydrophobic side chains of the amino acids F247, I276, F287 and F279. The first three residues are part of the native RNA-binding interface while F279 is more distal and interacts with the RNA only in the pre-binding state. We suggest that the pre-binding state is an intermediate between the native binding and an unbound state (Fig-ure 5). The RNA is highly flexible in this state and samples many different interactions with the protein, including specific interactions with the residues of the native binding interface. On a scale of tens of nanoseconds, this eventually triggered the third step of the binding, in which the prebinding state transitioned into the native binding mode as multiple binding pockets became spontaneously occupied. Note that in the majority of cases, this initially resulted in a misaligned binding register (non-optimal binding). For example, the U1 nucleotide would occupy the p2 pocket which allowed binding of U2 and U3 in pockets p3 and p4, respectively, thus leaving the p1 pocket unoccupied. Therefore, the fourth step of the binding process was sliding of the target RNA sequence into a binding register where all proteinbinding pockets are occupied. The fourth step did not always occur on the time scale of the SBS and is described in detail in the next section.
The above-mentioned four steps roughly delineate the most common scenario of the binding events observed for the 5 -UUUUU-3 or 5 -UUUA-3 RNAs and also allow us to estimate the associated time scales. Namely, the first step was the most limiting as, without any guiding force, the RNA would sometimes spend microseconds exploring various non-native interactions with other parts of the HuR RRM3 surface or diffusing through the water box before first approaching the native binding pockets. In some of the SBSs, the first step did not occur on the 10 s time scale. It was the most common cause of failed SBSs, i.e. trajectories where no successful binding events were observed (Supplementary Figure S6, #6 #12). Even when the first step occurred, another limitation was that the RNA could be captured in a position unsuitable for forming the rest of the native interface, such as the reversed 5 -3 polarity of the RNA chain or the U1 nucleotide being captured in pocket p4 (not shown in Figure 5). In most such cases, the RNA would still eventually transition into the pre-binding state (second step) and proceed with the binding process. However, sometimes the RNA would not depart from these unproductive and rather off-pathway arrangements for the rest of the SBS (Supplementary Figure S6, #5 #18).
Trajectories where the first nucleotide was captured by pocket p4 were the most sensitive to this behavior as pocket p4 is relatively unselective and can accommodate nucleotides in multiple, even non-native, spatial orientations. This problem may have been exaggerated by suboptimal force field parameters of the sulfur atom of C245, which is part of pocket p4 ( Supplementary Information). Lastly, the sliding of the RNA into the full (optimal) binding register often did not occur on the trajectory's time scale and only part, albeit a majority, of the native interface remained occupied until the end in most SBSs (Supplementary Figure  S6, #1 #8 #48). We still considered SBSs with such nonoptimal binding interfaces as successful. As explained below, we actually suggest that the native bound state of 5 -UUUUU-3 consists of two optimal and two non-optimal binding registers ( Figure 5).

Structural basis of the 5 -UUUUU-3 RNA sliding within the binding register of HuR RRM3
In principle, the individual Us of 5 -UUUUU-3 RNA can equally well occupy any of the four binding pockets of HuR Nucleic Acids Research, 2022, Vol. 50, No. 21 12487 Figure 5. Scheme of the HuR RRM3-5 -UUUUU-3 complex formation as revealed by SBSs. The simulations start with the biomolecules separated and freely diffusing in water until one of the Us spontaneously binds in any of the pockets (first contact). This eventually brings the RNA into extensive, non-specific interaction with the protein surface surrounding the native binding pockets (a pre-binding state), which then leads to formation of the native complex interface in either non-optimal or optimal binding registers. The system can exchange between the registers by temporarily reverting into the pre-binding state. Note that the scheme is merely illustrative of the general principle and that each of the arrows actually represents multiple pathways. Similar binding pathways were followed by the 5 -UUUA-3 RNA which, however, has only one optimal and one non-optimal binding register. The solid red lines indicate nucleotide interaction with the pocket. Dashed red lines indicate local competition between two nucleotides for a single binding pocket. Blurred pocket labels indicate empty binding pockets. Blue and green colors of the blob indicate unbound and bound protein structures, respectively. RRM3 ( Figure 1A), which raises a question of how the optimal binding register is established. Indeed, our SBSs showed that the initial binding events often result in nonoptimal binding registers. Most commonly, three pockets were filled while one (p1 or p4) remained empty ( Figures  5 and 6A; Supplementary Figure S6) and, in many SBSs, this was the final state of the binding process observable on the 10 s time scale. However, there were several instances where the RNA later spontaneously shifted within its binding register to reach the optimal state with all four pockets filled (Supplementary Figure S6, #7 #9 #13 #22 #47). We refer to this process as sliding.
Similarly to the binding, the individual sliding events also proceeded in a unique fashion, indicating that it is a very multidimensional multiple-pathway process. Nevertheless, we can say that the sliding (register exchange) is preceded by a local competition between two nucleotides for a single binding pocket (Figures 5 and 6B), most commonly pockets p1 or p4, which are partially disordered and able to transiently accommodate two nucleotides simultaneously. This local competition can trigger a large perturbation of the native interface, in which the system temporarily transitions back into the pre-binding state ( Figure 6C). As previously mentioned, the RNA is very flexible while in the pre-binding state. It tends to quickly revert back towards the native binding state but, in some cases, it does so with a binding register shifted by one or more nucleotides ( Figure 6D), thus completing the sliding process ( Figure 6E). The register exchange dynamics appear to be a process regularly occurring on a 1-10+ s time scale.

Spontaneous binding of the SRSF1 RRM2-5 -AGGAC-3 complex
Unlike the partially disordered interface in the HuR RRM3 system, SRSF1 RRM2 recognizes the GGA triplet motif in mRNA via classical well-defined binding pockets (46). There is also no viable alternative binding register as the G2, G3 and A4 nucleotides of 5 -AGGAC-3 RNA can only fit into pockets p1, p2 and p3, respectively ( Figure 1B). We observed successful binding in five out of 18 SBSs (Table 1; Supplementary Figure S7), a noticeably lower success rate (∼30%) than for the HuR RRM3 system. This was mainly due to the abundance of non-native sites explored by the RNA on the SRSF1 RRM2 surface which sometimes prevented it from reaching the native binding interface on the 10 s time scale. In other words, the binding landscape of the SRSF1 RRM2-5 -AGGAC-3 complex appears to be considerably frustrated by off-pathway binding. The second reason was that there is no non-optimal binding register possible for the 5 -AGGAC-3 RNA and it needs to achieve the optimal binding register in the context of a single binding event instead of being able to reach it incrementally via suboptimal binding registers like in HuR RRM3. Likewise, there was no indication of any pre-binding state ( Figure  5) which could guide the RNA towards the native binding interface. Indeed, in the five successful SBSs, we always observed one of the nucleotides spontaneously binding in one of the native binding pockets, which then led to binding of the second and finally the third (Figure 7). Although relatively straightforward compared with HuR RRM3 (see above and Figure 5), the RNA binding to SRSF1 RRM2 Figure 6. Spontaneous 5 -UUUUU-3 RNA sliding on the surface of HuR RRM3 in one simulation. After the first initial contact with the protein at ∼0.24 s, the RNA bound in a non-optimal binding register with only three out of the four pockets filled (A). It was followed by many reversible local disruptions/fluctuations of the protein-RNA interface where U3 and U4 competed for the p4 binding pocket (B; red circle). This eventually led to a larger scale global disruption at ∼4.9 s where the RNA detached from the binding pockets but did not fully depart the protein's surface, entering the pre-binding state (C). The system would repeatedly fluctuate between the pre-binding and native states until ∼7.1 s when one of the global disruptions was followed by a sudden chain sliding by two nucleotides (D; ∼7.25 s) in the 5 direction and the native interface was (re)formed with all four binding pockets occupied, though still dynamically fluctuating (E). (F) Time development of the binding pocket occupancies by specific nucleotides; note that the binding occupancy descriptor does not fully visualize the dynamicity of the complex.
is still a multiple-pathway process as it does not appear to have any preference for the order in which the nucleotides are bound. The leading cause for failure among the unsuccessful SBSs was the RNA not reaching the native binding site within the simulation time scale (Supplementary Figure S7, #4 #12). Another common reason was one of the nucleotides binding in compatible but non-native binding pockets (i.e. A1 in p3 or G3 in p1; Supplementary Figure  S7, #5 #21). Such binding was relatively stable on the simulation time scale and stalled the RNA, preventing it from reaching the native binding mode. Importantly, there were no RNA globules observed in SBSs when using OL3-stafix (Supplementary Figure S8).
Interestingly, with OL3-stafix, we also observed improvements in simulations of the fully bound complex. Namely, in two such simulations using the standard OL3 force field, we observed the A4 breaking its native protein-RNA interactions and permanently detaching soon after starting and after ∼1.2 s, respectively, while forming extensive selfinteractions with neighboring nucleotides. In one of the simulations, the second pocket was similarly lost after ∼5 s. No such progressive degradation of the protein-RNA interface and RNA globule formation was seen in OL3-stafix simulations (Supplementary Figure S9).

The standard force field does not allow binding
For comparison, we also performed SBSs where we did not use stafix, i.e. with the standard OL3 force field. Although the RNA would sometimes sample the native binding pock-Nucleic Acids Research, 2022, Vol. 50, No. 21 12489  ets, successful binding occurred only in three SBSs out of 18 and only for the HuR RRM3-5 -UUUUU-3 system. Some binding attempts can rarely occur even for the RNA globule when some uracils become temporarily exposed on its surface while simultaneously being in the proximity of the native binding site. However, even then the binding is visibly frustrated compared with simulations with OL3-stafix due to competition from the RNA self-interactions which ultimately overpower the binding tendency and the binding is lost again. In other words, stable binding is achievable only with OL3-stafix. In all the other simulations, including the SRSF1 RRM2-5 -AGGAC-3 and HuR RRM3-5 -UUUA-3 systems, extensive formation of RNA globules entirely prevented spontaneous binding with the standard OL3 force field. Full details are given in the Supplementary Information.

Influence of water model and selection of the protein force field
We observed virtually identical performance when using the SPC/E (59) or OPC (60) water models in our simulations (Table 1; Supplementary Table S1), including the tendency to form RNA globules without stafix (Supplementary Figure S10) and the similar benefit provided by the modification. Therefore, we suggest that both water models are equally valid choices for SBS. In all our simulations, we used the ff12SB force field to describe the proteins. The reason for preferring this protein force field version over the newer ff14SB or ff19SB is the superior (for the present systems) description of the phenylalanine and tyrosine side chains provided by ff12SB (Supplementary Figure S11). These residues are prominent at the protein/RNA interfaces of RRM domains studied in this work (44,46) as well as in protein-RNA complexes in general. Full details are given in the Supplementary Information.

Simulations of tetranucleotides, hexanucleotide and A-RNA with OL3-stafix
OL3-stafix is a goal-specific force field modification in no way intended to be a multipurpose RNA force field. Still, we also examined its performance in simulations of TNs, HN and A-RNA duplex (Supplementary Table S1)--simple RNA motifs which are commonly used in force field benchmark studies. As expected, the use of OL3-stafix increased the RNA SAS of all simulated TNs and the HN. For the TNs, the stafix abolishes both the spurious intercalated and native A-form-like conformations in simulations, especially at higher scaling factors (Supplementary Table S2).
On the other hand, changes (compared with the standard OL3 force field) of the A-RNA duplex geometry in OL3stafix simulations were surprisingly minor (Supplementary Figures S12, S13 and S14; Table S3), especially when one considers how extensively stafix alters the system's potential energy function. There were no signs of the duplex helical structure perturbation with stafix 0.5 even on the 10 s time scale, although there was some increase in reversible terminal base pair fraying. Therefore, with caution, stafix could potentially also be applied for protein-RNA systems where RNA stem-loops are bound (e.g. the U1A RRM protein-RNA complex), especially when the interacting RNA segment is not internally structured. We reiterate that the stafix modification is not intended for simulations of folded RNAs. However, in protein-RNA binding studies, its applicability could be broader than just simulations with short ssRNA segments. Another possibility would be to apply stafix only on part of the RNA molecule (e.g. the loop part of the stem-loop). See the Supplementary Information for more details.

Spontaneous binding simulations of protein-RNA complexes: from two-state model to ensemble description
We have used MD simulations to explore completely spontaneous protein-RNA binding events for two simple protein-RNA interfaces (Table 1). Such calculations were previously accomplished for other systems either by using coarse-grained models or by molecular docking approaches (10,18,76), but not in the context of unbiased full atomistic MD. The present simulations reveal astonishingly rich and diverse multidimensional dynamics on time scales from nanoseconds to many microseconds. To achieve this, we developed a force field modification (called stafix) which addresses the tendency of unstructured ssRNAs to excessively self-interact in MD simulations when using AMBER force fields designed for folded RNA structures. Using this method, we were for the first time able to observe the entire binding process of ssRNAs to HuR RRM3 and SRSF1 RRM2 (44,46) proteins, from the free diffusion around the proteins through initial conformational capture to final formation of the native binding interfaces by induced fitting.

Binding of RNA to HuR RRM3 is a complex multiplepathway process with a partially disordered bound state
It can be said without exaggeration that no two binding events that we observed for the HuR RRM3 system proceeded in exactly the same fashion. Each binding event was unique. This demonstrates the value of standard (unbiased) MD simulations. To describe all nuances of the binding process using a limited number of collective variables would be an arduous task and the dimensionality reduction excessive (77). This was most obvious with the 5 -UUUUU-3 RNA which can bind to HuR RRM3 in multiple binding registers while the individual Us can in principle occupy all four binding pockets equally well. As a consequence, there are countless pathways between the unbound and bound states that the system can utilize during the binding process. Importantly, the multidimensionality is also a hallmark of the bound state. The bound complex is characterized by rich multidimensional thermal dynamics involving four registers and ranging from subnanosecond to (multi)microsecond time scales. The RNA is localized at the binding site but is semi-disordered. Lastly, we note that the binding rate constants estimated from our SBSs (Supplementary Table S4) are rather consistent with k on rate constants measured for another RRM protein-RNA system (78), suggesting that the speed of the binding in our simulations is not unrealistic.
In our HuR RRM3 SBSs, the binding typically started with one nucleotide randomly binding in one of the pockets which brought the rest of the RNA close enough to also sample the other pockets and eventually bind there. At this stage, we also observed an intermediate 'pre-binding state' where the native binding pockets are not yet filled but the RNA is already in extensive molecular contact with the protein surface proximal to the native binding interface. This contact typically involved RNA ribose rings and four amino acid side chains which are evolutionarily conserved as hydrophobic in the RRM3 domains of the Hu family of pro-teins (79). This included F279 which did not otherwise interact with the natively bound RNA. Part of its evolutionary role could therefore be to serve as the RNA binding platform in the pre-binding state.
The initial binding events almost always resulted in a non-optimal binding register where not all the binding pockets were filled ( Figure 5). In a few simulations, we later saw the RNA spontaneously shifting into the optimal binding register, i.e. sliding ( Figure 6). The sliding events were precipitated by the RNA temporarily transitioning back into the pre-binding state before returning into the native binding state with an exchanged register ( Figure 6). We suggest the HuR RRM3 might be able to rapidly scan the target RNA sequence in this fashion. Firstly, the RNA prebinding state (Figure 5), in which the RNA is very flexible compared with the native binding state, allows sliding without the need to fully separate the RNA from the protein surface. This is advantageous as full separation of the two biomolecules would dramatically decrease solvent entropy (80,81). The pre-binding state is evidently separated only by a relatively small free energy barrier from the native binding state, allowing it to be regularly visited on a microsecond time scale. It thus can be considered as a genuine part of the binding funnel indistinguishable from the bound state at lower temporal resolutions. The basic principle of binding funnel expansion and lowered ruggedness of the binding landscape is reminiscent of, for example, the fly-casting mechanism (82) of protein-DNA association. However, in our particular case, the essential element of binding is the on-pathway pre-binding state. The rich dynamics are further greatly aided by the partial binding disorder of pockets p1 and p4, which facilitates competition between neighboring nucleotides and results in a very flat multiple-minima and multidimensional free energy surface of the bound state ( Figure 6).
Secondly, when the sliding occurs with a single concerted movement in the context of the diffusive pre-binding state, HuR RRM3 avoids the speed limitations of having to proceed one nucleotide at a time. Although such a process of sliding 'pocket after pocket' is in principle possible, a complete sliding event would require all five nucleotides exchanging in the same direction (either 5 to 3 or 3 to 5 ). This 'linear diffusion' would be controlled (hindered) by multiple barriers, leading to a quite slow exchange rate unusable in biological context; it is often referred to as the speed-stability paradox of 1D diffusion ( Figure 8A) (83). It is the presence of the pre-binding state that circumvents this and allows the sliding.
Thirdly, a two-state model of the bound state with the native state (where sliding is slow) and the pre-binding state (where sliding is fast) means that the protein may effectively separate affinity (or binding capability) from specificity, and is able to separately tune both during evolution. The essence of the sliding mechanism employed by HuR RRM3, i.e. local disruption followed by quick global disruption followed by the register shift and restoration, resembles an earlier described translocation mechanism of the WRKY domain protein along the DNA helix (84). We suggest it could be a general strategy employed by proteins specialized in efficient scanning of long nucleic acid sequences without an external driving force. In the case of HuR RRM3, it would allow it to rapidly search for the RNA sequence whose binding corresponds to thermodynamic minimum while efficiently screening against utterly incompatible sequences (such as the poly(C) RNA; see the Supplementary data) by sequestering them in the non-specific and relatively weak pre-binding state.
In summary, binding of ssRNA to HuR RRM3 is a textbook example of a process whose full understanding requires the ensemble description (85,86). Such a description should be considered in MD simulation studies, as well as in interpretations of experiments (44,45).

Binding to HuR RRM3 in the context of multiple or fulllength mRNAs
Similarly to most in vitro experimental studies, our present work does not consider how competition between multiple RNAs for the same binding site would impact the suggested binding mechanism ( Figure 5). Further, the length of our simulated RNA chain is short. Protein-RNA interactions involving the flanking nucleotides in longer mRNA segments could generally affect the binding and potentially help to select the binding register. We aim to be able to perform such calculations in the future but additional force field modifications may be required to make them feasible, which is already outside the scope of this study. Nevertheless, the present data along with earlier published experimental studies (44,45) allow us to formulate several hypotheses.
Firstly, it has been shown that the ideal target sequence for HuR RRM3 is 5 -AUUUA-3 (44). This sequence can bind in two binding registers (with either the A1 or A5 nucleotide in pockets p1 and p4, respectively), which probably provides an entropic advantage. Based on our simulations, we suggest that the pre-binding state ( Figure 5) might represent a low-barrier pathway between the two binding registers ( Figure 8B), giving rise to a very broad free energy minimum when binding the 5 -AUUUA-3 and stalling the conformational search at this sequence.
Secondly, we suggest that besides a linear scanning, a strand replacement mechanism scenario may play a role, in which a bound RNA would be displaced by another RNA or by a sequence upstream or downstream of the bound nucleotides ( Figure 8C). Although this process cannot be simulated at the moment, we suggest that the observed partially disordered nature of pockets p1 and p4 and their ability to temporarily accommodate two nucleotides ( Figures 5 and  6) would facilitate such a scenario. By providing an inherent bridgehead for the incoming RNA, the speed of the exchange in favor of a strand with thermodynamically more stable binding would be accelerated rapidly, akin to the toehold or end fraying mechanisms well known for strand displacements in DNA and RNA duplexes ( Figure 8C) (87)(88)(89), or the active cycling models of RNA exchange on the surface of the bacterial Hfq protein (57,90,91).

Binding strategy of SRSF1 RRM2 is different from that of HuR RRM3
Our SBSs identified no pre-binding state for the SRSF1 RRM2 system. As a consequence, the SBS success rate was lower for this system (Table 1) as the conformational capture and the induced fit needed to form the native interface had to occur in essentially a single uninterrupted 'all-ornone' process (Figure 7), which presents a sampling limitation. Since there were no regular sliding events observed, the correct binding register also had to be established on the first try. For example, we could observe a situation where the second G of the GGA triplet bound in the first pocket. Without sliding, the only way to resolve such a situation was full separation and another binding attempt which was very slow and often did not occur on the 10 s time scale of our simulation. We cannot rule out that this behavior could be a consequence of using a short RNA which contains only a single GGA triplet. In full mRNA, the competition from the other nucleotides might be used to dislodge the out-ofthe-register binding (see, for example, Figure 8C). However, we suggest that our simulations might also reflect genuinely different evolutionary requirements placed on the SRSF1 RRM2 and HuR RRM3 proteins. In any case, the simulations suggest that different proteins may use diverse mechanisms and pathways even for the at first sight straightforward ssRNA binding, indicating that there is no universal mechanism by which RRMs search for their RNAs.

Stafix potential prevents formation of RNA globules and enables spontaneous protein-RNA binding
The inability to simultaneously describe folded and disordered (unstructured) biopolymers is an Achilles' heel of contemporary biomolecular force fields (21). In this work, it is showcased by the ssRNA collapsing into hyperstable globules which outcompete protein-RNA interactions, making such ssRNA unsuitable for protein-RNA binding (Figure 3). The RNA seemingly prefers selfinteractions over any other form of intermolecular contact, so much so that the RNA self-interactions gradually outcompete and disrupt native protein-RNA interactions even in the initially bound complexes (Figure 4; Supplementary  Figures S4 and S9). In addition, a high population of the RNA globule was observed even in the REST2 enhancedsampling simulation, which corroborates the notion that the global OL3 force field thermodynamic minimum probably corresponds to the RNA self-interactions (Supplementary Figure S3).
We addressed this problem with a force field modification altering the stability of RNA-RNA vdW interactions via rescaling selected pairwise LJ potentials ( Figure 2); technically using an approach that is known as non-bonded fix (NBfix) (92). NBfix is very effective for separate tuning of interactions which cannot be optimally described using a single set of LJ parameters (30,38,(93)(94)(95)(96)(97). We call the new modification stafix (stacking fix) as it primarily addresses the overstabilization of stacking present in the standard OL3 force field. Simulations with OL3-stafix can be used to observe spontaneous binding as well as to stabilize protein-RNA interfaces (Table 1). However, it could also be of interest in simulations of free ssRNAs (Supplementary Information). We also note that besides OL3, the stafix approach can be straightforwardly combined with other RNA force fields (e.g. DESRES and García) where it similarly suppresses the RNA globules (Supplementary Figure S3). In addition, the stafix factor (i.e. the magnitude of the reduction of the vdW interactions) is a parameter that can be empirically adjusted for each specific RNA force field.
The suggested design of stafix ( Figure 2) is a result of multiple iterative trial-and-error attempts. Our original intent was to only weaken the excessive base-base stacking which we suspected was the chief factor in formation of the spurious RNA self-interactions (18,98). This initial approach proved successful in lowering the amount of base stacking; however, it immediately unmasked similarly excessive sugar-sugar, sugar-base as well as base-phosphate stacking and vdW interactions, an observation consistent with earlier works (38,95,97). Thus we ultimately extended the list of rescaled atoms to weaken stacking (or vdW) interactions of the entire RNA. Note that we do not suggest OL3-stafix as a general force field but rather a task-oriented modification focused on achieving two specific goals. Firstly, it represses RNA globules and provides ssRNA ensemble populations of the extended unstructured conformation that is commonly recognized by the proteins but rarely populated with the standard force field. Secondly, it speeds up RNA's conformational transitions by smoothing its folding landscape. It actually could be a biologically relevant description for the short ssRNAs (99), rather than the likely excessively rugged folding landscape predicted by the standard force field (18,100). Fine-tuning the individual details in design of stafix ( Figure 2; Supplementary Information) is relatively unimportant as long as the overall cumulative effect is achieved with minimum or no side effects for a given set of goals (we reiterate that our goal was not to create a universally applicable core RNA force field). We obviously do not recommend the use of OL3-stafix for structured RNAs. In fact, our preliminary tests on the neomycin-sensing riboswitch, which contains a highly structured U-turn motif (49), indicated a suboptimal performance with OL3-stafix (Supplementary Figure S15). On the other hand, it is encouraging that for the A-RNA duplex we observed only very minor changes of helical parameters (Supplementary  Table S3). We suggest that stafix rescaling factor 0.5 (Table  1) should be sufficient to eliminate the ssRNA globules.
We also showed that OL3-stafix very efficiently eliminates the so-called intercalated conformations of the TNs, suggesting that this spurious conformation at least partly originates from the force field terms addressed by stafix. At more modest scaling factors, stafix could be of some interest for future force field development, in particular when combined with the other methods proposed to improve simulations of ssRNAs, such as gHBfix (26,36). However, its greatest merit is without any doubt with simulations of protein-RNA complexes where the unstructured ssRNA often corresponds to the RNA conformation recognized by the protein. We suggest that elimination of the compacted RNA structures from the free energy landscape in such studies is justified. Even if such non-specific self-interacting RNA structures are to a certain extent populated in the ensembles of real molecules, such RNA states are not likely to contribute to productive binding pathways, assuming the conformational capture mechanism at the initial stages of the binding. In other words, they are localized off-pathway from the binding funnel on the free energy landscape.
In the present study, the stafix modification was used with the OL3 AMBER force field (termed as OL3-stafix), but in principle it can also be combined with other nucleic acid force fields, including those for DNA. Considering the known difficulties in reparameterizing the general RNA force field (18,26,36), extending established core mul-tipurpose force fields, such as OL3, by goal-specific modifications, may allow many additional insightful simulation studies to be carried out. This is a common strategy in coarse-grained modeling that could be used also in the framework of atomistic force fields.

Future challenges and perspective
Many of our SBS trajectories revealed various non-native but long-lived protein-RNA interactions sampled by the system which prevented it from reaching native binding. This was quite visible with the SRSF1 RRM2 system (Table 1). We suspect that non-specific protein-RNA interactions, albeit to a lesser degree, might be similarly excessive as the RNA self-interactions. In fact, this is to be expected as the vdW parameters for proteins and RNAs are largely identical. Therefore, studies of more complex protein-RNA interfaces may require additional goal-specific adjustments of the methodology to destabilize off-pathway non-specific binding sites; the presently simulated systems are relatively simple protein-RNA interfaces. Also, use of enhanced sampling methods may become inevitable to better focus the SBS on the binding funnel in more complicated systems, although extreme care will have to be taken not to excessively reduce dimensionality of the binding process. In any case, it is evident that atomistic MD simulations can provide striking insights into the ensemble nature of protein-RNA binding, which is difficult to assess by other techniques.

CONCLUSIONS
In this work, we for the first time sampled the binding funnels of two protein-RNA complexes, i.e. HuR RRM3 and SRSF1 RRM2, using completely unbiased fully atomistic MD simulations. We provide structural details of the protein-RNA complex formation and binding register exchanges. The simulations furnished the first insights into the complexity and diversity of dynamic processes that can occur at protein-RNA interfaces. They underscore the importance of an ensemble-based description of protein-RNA interactions, which especially for the HuR RRM3 complex is essential to understand not only the binding process, but also properties of the partially disordered bound state. On the methodology side, our work required development of a force field modification called stafix, which eliminates RNA's tendency to spuriously self-interact. Stafix provides long-term stabilization of protein-RNA interfaces and allows simulations of the binding process. After eliminating the spurious RNA self-interactions, the simple nonpolarizable force field becomes a surprisingly powerful tool for prediction of protein-RNA interactions and complex formation mechanisms, at least for some interfaces.

DATA AVAILABILITY
The code for implementing the stafix modification is available on Zenodo (DOI: 10.5281/zenodo.7273724). The raw simulation trajectories are available from the corresponding author upon reasonable request due to the large size of the datasets.